A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees
Several recent publications report advances in training optimal decision trees (ODTs) using mixed-integer programs (MIPs), driven by algorithmic advances in integer programming and a growing interest in addressing the inherent suboptimality of heuristic approaches such as CART. In this paper, we propose a novel MIP formulation, based on the 1-norm support vector machine model, to train binary oblique ODTs for classification problems. We further present techniques, such as cutting planes, that tighten its linear relaxation and improve run times to reach optimality. Using 36 datasets from the University of California Irvine Machine Learning Repository, we demonstrate that our training approach outperforms its counterparts from the literature in terms of out-of-sample performance (around 10% improvement in mean out-of-sample testing accuracy). Towards our goal of developing a scalable framework to train multivariate ODTs on large datasets, we propose a new linear-programming-based data selection method to choose a subset of the data, and use it to train a decision tree through our proposed MIP model. We conclude this paper with extensive numerical results that showcase the generalization performance of our new MIP formulation and the improvement in mean out-of-sample accuracy on large datasets.
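The 1-norm SVM that the abstract names as the building block of the MIP formulation can itself be posed as a linear program: minimize \|w\|_1 + C * sum of slacks subject to the usual soft-margin constraints, with |w_j| linearized by splitting w into nonnegative parts. The following is a minimal sketch of that building block only (not the authors' MIP), using `scipy.optimize.linprog` on a toy dataset; all variable names here are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm(X, y, C=1.0):
    """Train a 1-norm (L1-regularized) soft-margin SVM by solving an LP.

    min  sum(w_plus) + sum(w_minus) + C * sum(xi)
    s.t. y_i * ((w_plus - w_minus) . x_i + b) >= 1 - xi_i,   xi >= 0
    """
    n, d = X.shape
    # Decision vector layout: [w_plus (d), w_minus (d), b (1), xi (n)]
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    Yx = y[:, None] * X  # rows are y_i * x_i
    # Margin constraints rewritten in A_ub @ z <= b_ub form
    A_ub = np.hstack([-Yx, Yx, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:d] - res.x[d:2 * d]
    b = res.x[2 * d]
    return w, b

# Toy linearly separable data: class given by the sign of the first feature
X = np.array([[2.0, 1.0], [1.5, -1.0], [-2.0, 0.5], [-1.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = l1_svm(X, y)
pred = np.sign(X @ w + b)
```

Because the objective is the 1-norm rather than the quadratic 2-norm, the whole problem stays linear, which is what makes it attractive to embed inside a MIP over tree structure decisions.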
Review for NeurIPS paper: A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees
Clarity: The main paper is mostly well written, the Appendix less so (many typos, at least). The work nevertheless lacks clarity because several relevant details are deferred to the supplementary material, and some aspects are not mentioned at all (at least in the main paper). The Appendix even contains a section regarding categorical features that is not even hinted at in the main paper. Clarification is needed, e.g., at the following points:
- p.2, l.70-73 is too vague; the meaning is unclear -- please clarify.
- p.2, l.85f: clarify what "[...] i enters leaf node l" means (i.e., that data point i is routed to leaf l). If \hat{y}_i denotes a predicted label, then why is it real-valued and not in [Y]? (Also regarding the description on p.3, l.96f: why should y_i - \hat{y}_i \geq 1 here -- \hat{y}_i is in R, so couldn't it be, say, y_i - \delta for some small \delta?)
- p.3, l.92: perhaps clarify "tree sparsity" -- here this actually means sparsity of the decision hyperplanes, not of the tree itself.
- The 1-norm is used in the MIP (1) and several times later in the text called "linear" (e.g., p.4, l.136), but this is technically incorrect.
Review for NeurIPS paper: A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees
This paper employs advances in the computational efficiency of mixed-integer programming towards decision tree construction problems. While locally optimal methods can efficiently achieve an upper bound on the minimization problem, closing the optimality gap requires tight lower bounds. The authors use an interval relaxation and a support-vector-machine procedure to tighten the lower bound. To scale the algorithm, the authors use an LP-based data selection procedure, and perform all experiments using this procedure. It is not clear whether the global optimality properties of the MIP formulation carry over under the data-selection procedure.